PROBER: Ad-Hoc Debugging of Extraction and Integration Pipelines

Abstract

Complex information extraction (IE) pipelines assembled by plumbing together off-the-shelf operators, specially customized operators, and operators re-used from other text processing pipelines are becoming an integral component of most text processing frameworks. A critical task faced by the IE pipeline user is to run a post-mortem analysis on the output. Due to the diverse nature of extraction operators (often implemented by independent groups), it is time-consuming and error-prone to describe operator semantics formally or operationally to a provenance system. We introduce the first system that helps IE users analyze pipeline semantics and infer provenance interactively while debugging. This allows the effort to be proportional to the need, and to focus on the portions of the pipeline under the greatest suspicion. We present a generic debugger for running post-execution analysis of any IE pipeline consisting of arbitrary types of operators. We propose an effective provenance model for IE pipelines which captures a variety of operator types, ranging from those for which full specifications are available to those for which none are. We present a suite of algorithms to effectively build provenance and facilitate debugging. Finally, we present an extensive experimental study on large-scale real-world extractions from an index of ~500 million Web documents.